Locally Weighted Least Squares Temporal Difference Learning

Authors

  • Matthew Howard
  • Yoshihiko Nakamura
Abstract

This paper introduces locally weighted temporal difference learning for evaluation of a class of policies whose value function is nonlinear in the state. Least squares temporal difference learning is used for training local models according to a distance metric in state-space. Empirical evaluations are reported demonstrating learning performance on a number of strongly non-linear value functions, without the need for prior knowledge of features or a specific functional form.
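The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract does not give the kernel, the local model form, or any regularisation. The sketch assumes a Gaussian kernel as the state-space distance metric, a local linear model centred on the query state, and a small ridge term for numerical stability. The function name `lw_lstd_value` and the parameters `bandwidth` and `ridge` are hypothetical.

```python
import numpy as np

def lw_lstd_value(query, states, rewards, next_states,
                  gamma=0.9, bandwidth=0.5, ridge=1e-8):
    """Estimate V(query) by fitting one locally weighted LSTD model.

    Assumptions (not taken from the paper): Gaussian kernel weights,
    local linear features [1, s - query], and a small ridge term.
    """
    r = np.asarray(rewards, dtype=float)
    n = len(r)
    q = np.atleast_1d(np.asarray(query, dtype=float))
    S = np.asarray(states, dtype=float).reshape(n, -1)
    S_next = np.asarray(next_states, dtype=float).reshape(n, -1)

    # Weight each transition by the distance of its start state to the query.
    w = np.exp(-0.5 * np.sum((S - q) ** 2, axis=1) / bandwidth ** 2)

    # Local linear features centred on the query, so theta[0] is V(query).
    phi = np.hstack([np.ones((n, 1)), S - q])
    phi_next = np.hstack([np.ones((n, 1)), S_next - q])

    # Weighted LSTD normal equations: A theta = b.
    weighted_phi = phi * w[:, None]
    A = weighted_phi.T @ (phi - gamma * phi_next) + ridge * np.eye(phi.shape[1])
    b = weighted_phi.T @ r
    theta = np.linalg.solve(A, b)
    return float(theta[0])

# Toy usage: dynamics s' = 0.9 s with reward r = s, so V(s) = s / (1 - 0.9 * gamma).
rng = np.random.default_rng(0)
s = rng.uniform(-2.0, 2.0, size=200)
print(lw_lstd_value(1.0, s, s, 0.9 * s, gamma=0.9))
```

Centring the features on the query state means the intercept of the local model is the value estimate at that state, so each query needs only one small linear solve. For a strongly nonlinear value function, the bandwidth controls how local each fit is.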

Similar resources

Derivative estimation based on difference sequence via locally weighted least squares regression

A new method is proposed for estimating derivatives of a nonparametric regression function. By applying a Taylor expansion technique to a derived symmetric difference sequence, we obtain a sequence of approximate linear regression representations in which the derivative is just the intercept term. Using locally weighted least squares, we estimate the derivative in the linear regression model. The ...

Co-learning with a locally weighted partial least squares for soft sensors of nonlinear processes

A method to improve the adaptivity of soft sensors is investigated in this paper. Soft sensors have become very important in the chemical industry to achieve a highly efficient, high-quality and safe production system. Among the various methods, the partial least squares (PLS) method is the most widely used for soft sensors. In this research, a co-learning style locally weighted PLS method which utilizes a se...

Imitation-based Learning of Bipedal Walking Using Locally Weighted Learning

Walking is an extremely challenging problem due to its dynamically unstable nature. It is further complicated by the high dimensional continuous state and action spaces. We use locally weighted projection regression (LWPR) as a locally structurally adaptive nonlinear function approximator as the basis for learned control policies. Empirical evidence suggests that control policies for high dimen...

Sustainable ℓ2-regularized actor-critic based on recursive least-squares temporal difference learning

Least-squares temporal difference learning (LSTD) has been used mainly for improving the data efficiency of the critic in actor-critic (AC). However, convergence analysis of the resulting algorithms is difficult when the policy is changing. In this paper, a new AC method is proposed based on LSTD under the discount criterion. The method comprises two components as the contribution: (1) LSTD works in an ...

Ensembles of extreme learning machine networks for value prediction

Value prediction is an important subproblem of several reinforcement learning (RL) algorithms. In a previous work, it has been shown that the combination of least-squares temporal-difference learning with ELM (extreme learning machine) networks is a powerful method for value prediction in continuous-state problems. This work proposes the use of ensembles to improve the approximation capabilitie...

Publication date: 2013